The Effectiveness of Resampling Method for Handling Class Imbalances in Software Defect Prediction

Defect prediction is crucial for keeping software products high-quality and reliable. Class imbalance, however, in which one class greatly outnumbers the other, poses a significant challenge to defect prediction models. This imbalance often biases a model toward the majority class, resulting in poor performance in identifying the defects that make up the minority class. Resampling addresses this by changing the distribution of the dataset: undersampling the majority class, oversampling the minority class, or combining the two. This study aims to develop a robust and accurate model that can overcome the limitations of class imbalance and improve overall defect prediction performance. Logistic regression, a widely used classification algorithm, offers interpretability and flexibility, making it suitable for defect prediction. This research investigates the effectiveness of the resampling technique in conjunction with logistic regression for dealing with class imbalance in software defect prediction. Accuracy and AUC measurements, compared with a t-test across 12 MDP datasets, show that Logistic Regression+Sample (Bootstrapping) works much better than Logistic Regression alone, with an average accuracy of 90.78% and an average AUC of 0.81.
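
To make the resampling idea concrete, here is a minimal sketch of one bootstrap-style scheme: every class is redrawn with replacement until it matches the size of the largest class. It is an illustration only and is not claimed to reproduce the authors' Sample (Bootstrapping) setup; the function name `bootstrap_balance`, the toy feature values, and the 8-to-2 class split are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def bootstrap_balance(rows, labels, seed=0):
    """Redraw every class with replacement up to the largest class size.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is resampled to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    new_rows, new_labels = [], []
    for label, members in by_class.items():
        # rng.choices draws with replacement, i.e. a bootstrap sample.
        drawn = rng.choices(members, k=target)
        new_rows.extend(drawn)
        new_labels.extend([label] * target)
    return new_rows, new_labels


# Toy data (made up): 8 non-defective modules (0) and 2 defective modules (1).
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = bootstrap_balance(rows, labels, seed=42)
print(Counter(balanced_labels))
```

After the call, both classes contain the same number of rows, so a classifier such as logistic regression trained on `balanced_rows` no longer sees the minority class as rare. Resampling of this kind should be applied to the training split only, so that evaluation data keeps its original distribution.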

Author Frieyadie; Munaisyah Abdullah; Foni Agus Setiawan
Conference Type Scopus indexed Reputable International Conferences
Published in 2023 International Conference on Information Technology Research and Innovation (ICITRI)
Quartile Rank Non-Q
DOI 10.1109/ICITRI59340.2023.10249255

Publisher IEEE
Pages 22-27
Semester Odd 2023/2024