Abstract
Among the techniques, algorithms, and applications of data mining, predicting the class
label of unlabeled objects (objects with an undefined class label) is a central task. The most
common approach in this area is the use of a classification technique (DT, Bayes, SVM, KNN, and
others), which represents what is known as supervised learning. However, in many cases neither
target class labels nor class boundaries are available to perform the prediction, so a new
approach, the clustering-classification technique, is used.
This paper surveys the most common studies conducted in this field and discusses their
experiments, the algorithms they used, the types and sizes of the data they utilized, and the
results they reported.
According to the results, applying clustering techniques before classification improved
classification accuracy and reduced experiment execution time. Some of the researchers also
showed the Cluster Classifier to be a suitable approach for summarizing data: it achieves a
summarization rate of over 50%, which represents a considerable reduction in the size of the
test datasets.
The findings of the surveyed studies indicated that, in addition to feature selection and
feature extraction, data preprocessing (handling missing data and applying effective outlier
detection techniques) enhanced classifier performance and accuracy while reducing the
classification error.
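The clustering-then-classification idea described above can be illustrated with a minimal sketch. It is not taken from any of the surveyed studies. It assumes k-means as the clustering step and KNN as the classifier, and the function names, the farthest-point seeding, and the toy data are all hypothetical choices made only for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding (an illustrative choice)."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the ones chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query))[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical unlabeled 2-D data forming two well-separated groups.
unlabeled = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

# Step 1: clustering assigns a pseudo class label to every unlabeled object.
pseudo_labels, centroids = kmeans(unlabeled, k=2)

# Step 2: a supervised classifier is trained on the pseudo-labels and applied to new objects.
pred_a = knn_predict(unlabeled, pseudo_labels, (1.1, 0.9))
pred_b = knn_predict(unlabeled, pseudo_labels, (7.9, 8.0))
```

On this toy data the two query points receive different cluster labels. The cluster ids carry no meaning by themselves, so interpreting them as real classes would still require domain knowledge.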
Keywords
classification
Cluster-Classifier
clustering
Data mining