Alkadhim Journal for Computer Science

1

Analyzing Big Data to Mitigate Cyber Attacks Using Machine Learning Classifiers: A Comparative Study of Efficient Classifiers

Pages: 1-21 English

Abstract

This article discusses a set of modern approaches and techniques in cyber security and big data analysis using machine learning, with the aim of improving attack detection and mitigating attack impact. As technology advances and the world enters a new phase of digital transformation, data are growing in large, uncontrolled quantities and are vulnerable to corruption and loss, so older methods can no longer be relied on to protect such volumes. The methodology analyses data in sequential stages based on segmentation, classification, and monitoring of network anomalies. It also removes redundant features to speed up big data analysis, focusing on attempts to hack operating systems and security systems. Principal Component Analysis (PCA) was applied to reduce the data dimensionality and identify influential features, facilitating the training of classifiers. The experiment used multiple binary classifiers, including k-Nearest Neighbours (k-NN), Support Vector Machines (SVM), Bayesian algorithms, and artificial neural networks (ANNs), chosen for their efficiency in monitoring network disturbances. The classifiers were combined using two methods, soft voting and majority voting, to achieve better detection accuracy. Two data processing methods were applied: the first divided the data into subsets, each with its own processing path, and the second used sensors that collect and analyse user data via parallel paths to detect anomalies. In an experiment on Internet of Things (IoT) device data, the combined classifiers (such as Support Vector (SV), Weighted Voting (WV), and Majority Voting (MV)) outperformed the individual classifiers, with the static SV classifier achieving a 2.5% increase in classification accuracy (ACC) over the best baseline classifier. The results confirm that systematically combining classifiers enhances the effectiveness of cyber detection systems in large and complex data environments.

Keywords

Analysis; Big data; Cyber Security; Machine Learning
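
The difference between the two combination rules named in the abstract, majority voting and soft voting, can be illustrated with a minimal sketch. The probabilities below are made up for illustration, and the function names are hypothetical; this is not the authors' implementation.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most classifiers (ties: first seen wins)."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across classifiers and pick the argmax."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__), averaged

# Hypothetical outputs of three binary classifiers for one network flow:
# each row is [P(normal), P(attack)].
probs = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]
hard_labels = [max(range(2), key=p.__getitem__) for p in probs]

majority_label = majority_vote(hard_labels)   # two of three say "normal"
soft_label, soft_probs = soft_vote(probs)     # one confident vote shifts the mean
```

In this toy case the two rules disagree: majority voting follows the two weakly confident classifiers, while soft voting follows the averaged probabilities. Passing non-uniform `weights` to `soft_vote` gives one possible form of weighted voting.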

2

Deep Learning-Based Blood Cell Classification with Enhanced Data Preprocessing and Augmentation

Pages: 22-36 English

Abstract

Accurate classification of blood cells is an important task in hematology, both for diagnosing blood disorders and for guiding clinical decision making. In this paper, we use a high-quality dataset of 17,092 microscopic peripheral blood cell images from the Hospital Clinic of Barcelona, encompassing eight cell types, all annotated by expert pathologists. To improve model performance and tackle the class imbalance in the dataset, we developed a data preprocessing and augmentation pipeline that includes contrast enhancement, normalization, geometric and photometric transformations, noise injection, and mixup-style synthetic data. We use two state-of-the-art deep learning models (EfficientNet-B0 and ResNet50) to benchmark the proposed pipeline. In our experiments, EfficientNet-B0 achieved an overall accuracy of approximately 98.3% and ResNet50 achieved 98.6%, with high precision, recall, and F1-scores for all classes. These preliminary results demonstrate the effectiveness of the designed preprocessing and augmentation strategies and provide a benchmark for future research on blood cell image classification in hematology.

Keywords

classification; deep learning; EfficientNet-B0; ResNet50
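
The abstract mentions mixup-style synthetic data without giving details. A minimal sketch of standard mixup follows, assuming the usual formulation in which two samples and their one-hot labels are blended with a coefficient drawn from a Beta distribution; the `alpha` value and the toy data are assumptions, not values from the paper.

```python
import random

def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two samples: x = lam*a + (1-lam)*b, and likewise for one-hot labels."""
    mixed_image = [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label

def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is assumed."""
    return rng.betavariate(alpha, alpha)

# Toy 4-pixel "images" and one-hot labels over 8 cell types (values are made up).
img_a, lab_a = [0.0, 0.2, 0.4, 0.6], [1, 0, 0, 0, 0, 0, 0, 0]
img_b, lab_b = [1.0, 0.8, 0.6, 0.4], [0, 0, 1, 0, 0, 0, 0, 0]

# A fixed coefficient keeps the example deterministic; in training one would
# typically call sample_lambda() once per pair or per batch.
mixed_img, mixed_lab = mixup(img_a, lab_a, img_b, lab_b, lam=0.75)
```

The mixed label keeps 0.75 of its mass on the first class and 0.25 on the third, so the soft target still sums to one.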

3

The Connected Classroom: Leveraging EdTech to Enhance Student Engagement

Pages: 37-54 English

Abstract

The effective integration of technology in higher education is crucial for fostering key student learning outcomes, yet empirical evidence from diverse contexts remains limited. This study used a mixed-methods approach to examine how university students perceive technology as supporting involvement, teamwork, and independence in learning. Data were gathered from 120 students (65 females and 55 males) at three Iraqi universities (Al-Mustansiriya University, the University of Diyala, and the University of Wasit) using surveys, classroom observations, and open-ended responses. The results showed generally positive attitudes toward technology integration. Involvement received the highest rating (M = 4.12), followed by teamwork (M = 3.98) and independence (M = 3.85). Statistical analyses showed differences in teamwork scores: female students reported higher teamwork than male students (t(118) = 2.04, p = .04), and students from Al-Mustansiriya University reported higher teamwork than those from the University of Diyala and the University of Wasit (F(2, 117) = 3.67, p = .028). No differences were found for involvement or independence. Observations pointed to strong peer teamwork, while feedback pointed to both opportunities (such as learning flexibility) and problems (such as communication issues and limited digital skills). Overall, the study suggests that technology can improve student involvement, but its impact on teamwork is affected by demographic and institutional characteristics. These results show that careful instructional design and institutional support are needed to distribute the benefits of technology-enhanced learning equitably.

Keywords

Collaboration; Educational Technology; learning environment; student autonomy; Student engagement

4

Optimized Hybrid CNN-LSTM Framework with Multi-Feature Analysis and SMOTE for Intrusion Detection in SDN

Pages: 55-75 English

Abstract

The growing use of software-defined networking (SDN) presents new security challenges that require advanced intrusion detection systems (IDS). This paper proposes a deep learning-based hybrid system that combines convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for effective intrusion detection in SDN environments. The model uses CNNs to extract spatial features from network traffic data and LSTMs to learn temporal patterns, enabling the identification of complex attack patterns. We evaluate the model on the InSDN dataset and test its performance with various feature sets, ranging from 6 to 83 features. Experimental results indicate a multi-class classification accuracy of 99.63% when using all 83 features in Group 1. Furthermore, we use the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, which considerably improves the detection accuracy for minority attack classes, reaching 99.76%. The results indicate that the presented hybrid CNN-LSTM model is a powerful and effective solution for improving SDN security.

Keywords

CNN; deep learning; Feature Selection; Intrusion Detection System; LSTM; SMOTE; Software-Defined Networking
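
The core idea of SMOTE, as named in the abstract, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python illustration under that assumption; the function name, the data, and the parameter values are hypothetical, and it is not the pipeline used in the paper.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Made-up 2-feature minority-class flows.
minority_flows = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_flows = smote(minority_flows, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.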

5

Computationally Efficient Hybrid Framework for MNIST Handwritten Digit Recognition

Pages: 76-86 English

Abstract

Handwritten digit recognition remains one of the most fundamental problems in computer vision. While deep learning (DL) models, particularly Convolutional Neural Networks (CNNs), have achieved state-of-the-art accuracy on benchmarks such as MNIST, the computational cost of such architectures makes them unsuitable for resource-constrained environments. This research proposes a computationally efficient hybrid framework that combines Principal Component Analysis (PCA), K-Means clustering, and a Random Forest (RF) classifier. The main novelty is to add K-Means cluster labels to the PCA-reduced feature set in order to augment the feature space and improve the discrimination of visually similar digits. In practice, the method reduces the original 784-dimensional pixel space to 50 dimensions using PCA. The subsequent integration of cluster information provides the RF classifier with latent structure patterns that significantly increase its discriminative power. Experiments on the MNIST dataset show a classification accuracy of 93.1% with a significantly lower computational footprint: training takes 45 seconds and prediction takes only 0.05 seconds per image. These results indicate that a strategic combination of dimensionality reduction, unsupervised feature augmentation, and ensemble learning can offer an efficient and effective substitute for DL models in image classification tasks carried out in resource-limited environments. The research prioritizes efficient, practical use of computing resources over achieving the highest possible accuracy.

Keywords

Feature Augmentation; Handwritten digit; Kmeans; MNIST; PCA; Random Forest; Recognition
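
The feature-augmentation step described in the abstract, appending a K-Means cluster label to each PCA-reduced vector, can be sketched as follows. The sketch assumes the PCA projection and the cluster centroids have already been computed elsewhere; the helper names and toy values are hypothetical, and whether the paper appends the raw cluster index or an encoded form is not stated.

```python
def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    distances = [
        sum((v - c) ** 2 for v, c in zip(vector, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))

def augment_with_cluster_label(reduced_features, centroids):
    """Append each sample's cluster index to its PCA-reduced feature vector."""
    return [list(x) + [nearest_centroid(x, centroids)] for x in reduced_features]

# Toy stand-ins: 3-dimensional "PCA" vectors instead of 50, and 2 centroids.
reduced = [[0.1, 0.0, 0.2], [0.9, 1.1, 1.0], [0.2, 0.1, 0.0]]
centroids = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

augmented = augment_with_cluster_label(reduced, centroids)
```

Each augmented vector has one more dimension than its input (51 instead of 50 in the setting the abstract describes), and the augmented vectors are what the Random Forest classifier would be trained on.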

6

Practical Implementation of Software Metrics to improve Maintainability in Open-Source Systems

Pages: 87-99 English

Abstract

This research examines common software metrics and maintainability measures in open-source Java systems. The authors test the major object-oriented metrics (Coupling Between Objects (CBO), Lines of Code (LOC), Weighted Methods per Class (WMC), Lack of Cohesion of Methods (LCOM), Depth of Inheritance Tree (DIT), and Cyclomatic Complexity) against real-world maintainability indicators such as bug counts, code modifications, and developer turnover. Despite a wealth of research on software metrics, there is currently no workable, repeatable, and experimentally verified process that combines metric analysis with actual maintainability indicators in freely available systems. The authors use Python-based data analysis and visualization, together with Spearman correlation analysis, to find statistically significant patterns. The findings indicate that the strongest predictors of maintainability problems are Coupling Between Objects (ρ = 0.82) and Cyclomatic Complexity (ρ = 0.77), followed by WMC (ρ = 0.70) and LOC (ρ = 0.65). DIT is the weakest predictor of defect frequency (ρ = 0.30). Overall, the study bridges the gap between theoretical academic metrics and the practical realities of software engineering.

Keywords

Cyclomatic Complexity; Lines of Code (LOC); Open-Source Software
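
The ρ values reported in the abstract are Spearman rank correlations, which equal the Pearson correlation of the rank-transformed data. The following is a minimal standard-library sketch of that computation; the metric and bug-count values are invented for illustration and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up per-class values: a coupling metric vs. number of reported bugs.
cbo = [2, 5, 7, 9, 12, 15]
bugs = [0, 1, 3, 2, 6, 8]
rho = spearman_rho(cbo, bugs)
```

Because the measure depends only on ranks, it captures any monotonic relationship between a metric and a maintainability indicator, not only a linear one.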

7

Federated Learning for Early Detection of Advanced Persistent Threats in IoT Networks

Pages: 100-108 English

Abstract

In the era of connected IoT devices, ensuring cyber security while preserving data privacy is increasingly critical. Federated learning offers a promising approach by enabling collaborative training of detection systems without sharing raw data. This paper presents a novel federated Intrusion Detection System (IDS) based on the XGBoost algorithm which, for the first time, is designed to detect the initial compromise (I.C.) phase of Advanced Persistent Threats (APTs) in distributed Internet of Things (IoT) environments. By leveraging the federated framework, the IDS achieves robust detection across multiple devices while maintaining privacy and minimizing computational overhead. Extensive simulation results indicate that the proposed method achieves a precision of 97%, a recall of 100%, and an F1-score of 98%, providing a practical and efficient solution for real-world IoT security challenges.

Keywords

Federated Learning; initial compromise; Intrusion Detection System
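
The three reported scores can be checked against each other, since the F1-score is the harmonic mean of precision and recall. The short sketch below uses only the figures quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

reported_precision, reported_recall = 0.97, 1.00
f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.985"
```

A value of about 0.985 is consistent with the reported F1-score of 98% once rounding of the published precision and recall is taken into account.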

8

Explainable Artificial Intelligence Integrated Ensemble Learning Framework for Diabetes Prediction

Pages: 101-126 English

Abstract

Accurate prediction of diabetes from clinical and demographic indicators is crucial, as early detection of this chronic metabolic disorder helps prevent serious long-term organ complications. Existing research continues to face significant challenges, including class imbalance, the scarcity of large and diverse datasets, and limited integration of explainable artificial intelligence. This research compares several ensemble learning methods, including Extremely Randomized Trees (Extra Trees), on a large imbalanced dataset (87,664 negative vs. 8,482 positive samples). To mitigate the imbalance, we evaluate six resampling approaches, including Random Over Sampling. We assess models using metrics robust to class imbalance (precision, recall, F1, AUC-ROC, and AUC-PR) and calibration measures. The Extra Trees classifier achieved the highest measured accuracy (0.994) when Random Over Sampling was used to balance the dataset. These results were also compared with several previous works and a number of machine learning algorithms, and the proposed approach performed best. Explainability is provided at both global and local levels: permutation importance and SHAP for global feature importance, and Local Interpretable Model-Agnostic Explanations (LIME) and force plots for instance-level reasoning. We further analyze this result using sensitivity, specificity, PR-AUC, and calibration, and report detailed experiments showing how the resampling method, hyperparameter tuning, and stratified validation influence performance. Finally, we provide clinically relevant insights from the SHAP analyses and discuss limitations and future directions for deploying interpretable models in screening workflows.

Keywords

Diabetes Prediction; Ensemble Learning; Explainable Artificial Intelligence; extremely randomized Trees; machine learning
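
Random Over Sampling, the balancing method the abstract pairs with the best result, duplicates randomly chosen minority-class samples until the classes are the same size. A minimal sketch follows, using a toy dataset with a 10:1 imbalance that roughly mirrors the reported 87,664 vs. 8,482 split; the function name and data are hypothetical.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(group) for group in by_class.values())
    out_x, out_y = [], []
    for y, group in by_class.items():
        # Draw with replacement to fill the gap up to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_x.extend(group + extra)
        out_y.extend([y] * target)
    return out_x, out_y

# Toy data: 20 negative and 2 positive samples.
X = [[i] for i in range(22)]
y = [0] * 20 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
```

Resampling of this kind is normally applied to the training portion of each split only, so that duplicated samples do not leak into the data used for evaluation.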
