A Public Bug Database of GitHub Projects and Its Application in Bug Prediction

Title: A Public Bug Database of GitHub Projects and Its Application in Bug Prediction
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Tóth Z, Gyimesi P, Ferenc R
Conference Name: Proceedings of the 16th International Conference on Computational Science and Its Applications (ICCSA 2016)
Publisher: Springer International Publishing
Conference Location: Beijing, China
Keywords: Bug database, Bug prediction

Detecting defects in software systems is an evergreen topic, since there is no real-world software without bugs. Many bug-locating algorithms have been presented recently that can help detect hidden and newly introduced bugs in software. Papers that try to predict faulty source code elements or code segments in a system always rely on experience from the past. In most cases these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well-constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is growing rapidly, which also gives an opportunity to work with public data. In this study we selected 15 Java projects from GitHub from which to construct a public bug database. We matched the already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics on these elements. After creating the bug database, we investigated whether it is usable for bug prediction. We used 13 machine learning algorithms to address this research question and achieved F-measure values between 0.7 and 0.8. Besides the F-measure values, we calculated the bug coverage ratio on every project for every machine learning algorithm. We obtained very high and promising bug coverage values (up to 100%).
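
To make the two reported measures concrete, the following is a minimal Python sketch, not the authors' implementation. The F-measure is the standard harmonic mean of precision and recall. The abstract does not define the bug coverage ratio, so the sketch assumes it is the share of known bugs that touch at least one class flagged as buggy. The function names, the input layout and the bug identifiers are hypothetical.

```python
def f_measure(actual, predicted):
    """Harmonic mean of precision and recall over per-class buggy/clean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # buggy, flagged buggy
    fp = sum(1 for a, p in pairs if not a and p)    # clean, flagged buggy
    fn = sum(1 for a, p in pairs if a and not p)    # buggy, missed
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bug_coverage(bugs_per_class, predicted):
    """Assumed definition: fraction of distinct bugs that affect at least
    one class the model flagged as buggy."""
    all_bugs = set().union(*bugs_per_class)
    if not all_bugs:
        return 0.0
    covered = set()
    for bugs, flagged in zip(bugs_per_class, predicted):
        if flagged:
            covered |= bugs
    return len(covered) / len(all_bugs)


# Hypothetical toy data: four classes, three distinct bugs.
bugs_per_class = [{"BUG-1"}, {"BUG-1", "BUG-2"}, set(), {"BUG-3"}]
actual = [bool(bugs) for bugs in bugs_per_class]
predicted = [False, True, True, False]

print(f_measure(actual, predicted))
print(bug_coverage(bugs_per_class, predicted))
```

Under this assumed definition a model can reach high bug coverage with a modest F-measure, because one bug often touches several classes and flagging any one of them counts the bug as covered.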

Page last modified: January 23, 2018