Abstract
This article discusses modern approaches and techniques in cyber security and big data analysis that use machine learning to improve attack detection and mitigate the impact of attacks. As technology advances and the world enters a new phase of digital transformation, data and information are growing in large, uncontrolled quantities and are vulnerable to corruption and loss, so older methods can no longer be relied on to preserve such enormous volumes of data. The methodology analyses data in sequential stages based on segmentation, classification, and monitoring of network anomalies. It also removes redundant features to speed up big data analysis, focusing on attempts to hack operating systems and security systems. Principal Component Analysis (PCA) was applied to reduce the dimensionality of the data and identify influential features, which facilitates the training of classifiers. The experiments used multiple binary classifiers, including k-Nearest Neighbours (k-NN), Support Vector Machines (SVM), Bayesian algorithms, and artificial neural networks (ANNs), chosen for their high efficiency in monitoring network disturbances. The classifiers were combined using two methods, soft voting and majority voting, to achieve higher performance and better detection accuracy. Two data processing methods were applied: the first divides the data into subsets, each with its own processing path, and the second uses sensors that collect and analyse user data along parallel paths to detect anomalies. An experiment was conducted on Internet of Things (IoT) device data, in which the combined classifiers (such as Support Vector (SV), Weighted Voting (WV), and Majority Voting (MV)) outperformed the individual classifiers, with the static SV classifier achieving a 2.5% increase in classification accuracy (ACC) over the best baseline classifier.
The results confirm that systematically combining classifiers enhances the effectiveness of cyber detection systems in large and complex data environments.
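As a minimal sketch of the combination rules named above, the following Python snippet contrasts majority voting with soft voting (plain and weighted) for a single sample. The probability values, the weights, and the function names `majority_vote` and `soft_vote` are illustrative assumptions, not values or code from the study.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return the label chosen by most classifiers (ties go to the smallest label)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> int:
    """Average per-class probabilities (optionally weighted) and return the top class."""
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c])


# Hypothetical outputs of three binary classifiers for one network flow.
# Each row is [P(normal), P(attack)]; class 0 = normal, class 1 = attack.
probas = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]

# Hard label of each classifier: the class with the highest probability.
hard_labels = [max(range(2), key=lambda c: p[c]) for p in probas]

majority_label = majority_vote(hard_labels)                    # counts votes only
soft_label = soft_vote(probas)                                 # uses confidence
weighted_label = soft_vote(probas, weights=[4.0, 4.0, 1.0])    # assumed weights
```

In this made-up case, two weakly confident "normal" votes outnumber one confident "attack" vote under majority voting, while unweighted soft voting follows the confident classifier. Weighting the first two classifiers more heavily flips the decision back, which shows how sensitive a weighted scheme is to the choice of weights.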
Keywords
Analysis
Big data
Cyber Security
Machine Learning