DebtHunter: A Machine Learning-based Approach for Detecting Self-Admitted Technical Debt


Due to limited time, budget or resources, a team is prone to introduce code that does not follow the best software development practices. This code that introduces instability in the software projects is known as Technical Debt (TD). Often, TD intentionally manifests in source code, which is known as Self-Admitted Technical Debt (SATD). This paper presents DebtHunter, a natural language processing (NLP)- and machine learning (ML)- based approach for identifying and classifying SATD in source code comments. The proposed classification approach combines two classification phases for differentiating between the multiple debt types. Evaluations over 10 open source systems, containing more than 259k comments, showed that the approach was able to improve the performance of others in the literature. The presented approach is supported by a tool that can help developers to effectively manage SATD. The tool complements the analysis over Java source code by allowing developers to also examine the associated issue tracker. DebtHunter can be used in a continuous evolution environment to monitor the development process and make developers aware of how and where SATD is introduced, thus helping them to manage and resolve it.

EASE 2021: Evaluation and Assessment in Software Engineering June 2021 Pages 278–28