Abstract
Chest X-ray imaging is the most accessible radiological technique for detecting and
diagnosing thoracic disease. The deployment of automated chest X-ray analysis systems
faces two main obstacles: unreliable labels in large datasets and poorly calibrated
predictive confidence. This work proposes a hybrid system that combines a Vision
Transformer (ViT) architecture with noisy-label learning and probability calibration for
multi-disease diagnosis from chest X-ray images. The system is trained on the CheXpert and
NIH ChestX-ray14 datasets, using the Co-Teaching and DivideMix noise-handling methods
together with self-supervised pretraining to make the learned features more robust to
supervision errors. Temperature scaling and Monte Carlo dropout are applied post hoc to
improve the reliability of confidence estimates without compromising discriminative
performance. The system aims to match or exceed conventional CNN and standard ViT models
in AUROC, mAP, and F1 score. It reduces the effect of unreliable labels while producing
meaningful confidence scores that clinicians can interpret. The model also produces
Grad-CAM++ explanations to help clinicians understand its decision-making process. The
overall goal is decision support for real-world chest X-ray interpretation that is both
accurate and safe to deploy.
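To illustrate the temperature scaling step named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes per-class sigmoid outputs (the usual choice for multi-label chest X-ray classification), a single shared temperature, and a simple grid search on held-out validation logits; the function names, the grid, and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_nll(logits, labels, temperature):
    # Mean binary negative log-likelihood over all (image, disease) pairs
    # after dividing every logit by the shared temperature.
    eps = 1e-12
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for z, y in zip(row_logits, row_labels):
            p = sigmoid(z / temperature)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            count += 1
    return total / count


def fit_temperature(logits, labels, grid=None):
    # Assumption: choose T by grid search on validation data.
    # Gradient-based fitting (e.g. L-BFGS) is a common alternative.
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(grid, key=lambda t: binary_nll(logits, labels, t))


# Toy validation set: the model is very confident (logit 4.0) on four
# single-label cases but is right on only three of them.
val_logits = [[4.0], [4.0], [4.0], [4.0]]
val_labels = [[1], [1], [1], [0]]

T = fit_temperature(val_logits, val_labels)
calibrated = sigmoid(4.0 / T)
```

Because dividing the logits by a positive constant preserves their ordering, this rescaling leaves ranking metrics such as AUROC unchanged, which is why it can improve confidence reliability without affecting discriminative performance.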
Keywords
Chest X-ray
label noise
medical image analysis
multi-disease diagnosis
uncertainty calibration
Vision Transformer