Abstract
Heart disease is a common cause of death and is difficult to detect manually. Machine learning classifiers that achieve high accuracy have therefore attracted considerable research attention, and they play an important role in practical cardiology by supporting early detection of heart disease. In this paper, an efficient and accurate heart disease detection system is proposed, based on an adaptive feature selection technique combined with four machine learning methods: Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF). The technique uses two feature selection methods, mutual information (MI) and recursive feature elimination (RFE), to determine the optimal number of selected features, which improves the performance of the classification models and reduces their time complexity. The proposed technique was evaluated on two standard datasets from the UCI Machine Learning Repository: Cleveland Heart Disease and Heart Statlog Cleveland. The best model was selected using cross-validation and saved as the prediction model. The results show that the number of selected features differs by dataset and by classifier. For the first dataset, the Support Vector Machine with mutual information (SVM-MI) model achieved the highest classification accuracy among the classifiers tested, approximately 96.75%, while for the second dataset the Random Forest with mutual information (RF-MI) model achieved an accuracy of 97.4%. The proposed technique achieved higher accuracy, precision, recall, and F1 score than recent work in this field.
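To make the idea of adaptively choosing the number of selected features concrete, the following is a minimal sketch and not the authors' implementation. It ranks features by empirical mutual information with the class label, then keeps the top-k prefix that gives the best cross-validated accuracy. Several parts are assumptions made only for illustration: the helper names (`mutual_information`, `rank_features`, `cv_accuracy`, `select_features`), the tiny synthetic dataset, and the 1-nearest-neighbour stand-in classifier, which replaces the SVM, LR, DT, and RF models of the paper so that the example needs only the Python standard library. RFE is not shown.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def rank_features(X, y):
    """Feature indices sorted by decreasing MI with the label (ties keep index order)."""
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def predict_1nn(train_X, train_y, row, cols):
    """Stand-in classifier (assumption): 1-NN with Hamming distance on `cols`."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum(train_X[i][c] != row[c] for c in cols))
    return train_y[nearest]

def cv_accuracy(X, y, cols, folds=5):
    """Cross-validated accuracy using only the feature columns in `cols`."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(predict_1nn(train_X, train_y, X[i], cols) == y[i]
                       for i in test)
    return correct / len(X)

def select_features(X, y, folds=5):
    """Try every top-k prefix of the MI ranking; keep the best, preferring smaller k."""
    ranking = rank_features(X, y)
    trials = [(cv_accuracy(X, y, ranking[:k], folds), k)
              for k in range(1, len(ranking) + 1)]
    best_acc, best_k = max(trials, key=lambda t: (t[0], -t[1]))
    return ranking[:best_k], best_acc

# Synthetic toy data (not heart disease data): feature 0 equals the label,
# features 1 and 2 are statistically independent of it.
y = [i % 2 for i in range(60)]
X = [[i % 2, (i // 2) % 2, (i // 4) % 3] for i in range(60)]

selected, acc = select_features(X, y)
print("selected feature indices:", selected, "cross-validated accuracy:", acc)
```

In this sketch the ranking is computed once on the full data for brevity; a stricter evaluation would recompute it inside each training fold so that the held-out samples do not influence which features are selected.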