MASTER: Multi-Source Transfer Weighted Ensemble Learning for Multiple Sources Cross-Project Defect Prediction
IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2024-03-25, DOI: 10.1109/tse.2024.3381235
Haonan Tong¹, Dalin Zhang¹, Jiqiang Liu¹, Weiwei Xing¹, Lingyun Lu¹, Wei Lu¹, Yumei Wu²

Multi-source cross-project defect prediction (MSCPDP) attempts to transfer defect knowledge learned from multiple source projects to a target project. MSCPDP has drawn increasing attention from academia and industry owing to its advantages over single-source cross-project defect prediction (SSCPDP). However, two main problems seriously restrict the performance of existing MSCPDP models: how to effectively extract transferable knowledge from each source dataset, and how to measure the amount of knowledge transferred from each source dataset to the target dataset. In this paper, we propose a novel multi-source transfer weighted ensemble learning (MASTER) method for MSCPDP. MASTER measures the weight of each source dataset based on feature importance and distribution difference, and then extracts the transferable knowledge using the proposed feature-weighted transfer learning algorithm. Experiments are performed on 30 software projects. We compare MASTER with the latest state-of-the-art MSCPDP methods, using statistical tests, in terms of well-known effort-unaware measures (i.e., PD, PF, AUC, and MCC) and two widely used effort-aware measures ($P_{opt}20\%$ and IFA). The experimental results show that: 1) MASTER substantially improves prediction performance compared with the baselines, e.g., by at least 49.1% in MCC and 48.1% in IFA; 2) MASTER significantly outperforms each baseline on most datasets in terms of AUC, MCC, $P_{opt}20\%$, and IFA; 3) MSCPDP models perform significantly better than the mean case of SSCPDP models on most datasets, and even outperform the best case of SSCPDP on some datasets. It can be concluded that 1) conducting MSCPDP is highly necessary, and 2) the proposed MASTER is a more promising alternative for MSCPDP.
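To make the idea of "weighting each source dataset, then combining per-source models" concrete, here is a minimal sketch under stated assumptions. It is not MASTER's algorithm: the distribution-difference measure (an importance-weighted distance between per-feature means), the single hypothetical `importance` vector, the `1 / (1 + gap)` weighting rule, and the helper names `weighted_gap`, `source_weights`, and `ensemble_score` are all illustrative choices, not taken from the paper.

```python
import math
from typing import Callable, List, Sequence

Row = Sequence[float]
# Hypothetical per-source model: maps a feature vector to a defect probability.
Model = Callable[[Row], float]


def feature_means(rows: Sequence[Row]) -> List[float]:
    """Per-feature mean of a dataset given as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def weighted_gap(source: Sequence[Row], target: Sequence[Row],
                 importance: Sequence[float]) -> float:
    """Assumed distribution difference: Euclidean distance between per-feature
    means, with each squared term scaled by a (hypothetical) feature importance."""
    ms, mt = feature_means(source), feature_means(target)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(importance, ms, mt)))


def source_weights(sources: Sequence[Sequence[Row]], target: Sequence[Row],
                   importance: Sequence[float]) -> List[float]:
    """Give closer sources larger weights; weights are normalized to sum to 1."""
    raw = [1.0 / (1.0 + weighted_gap(s, target, importance)) for s in sources]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_score(models: Sequence[Model], weights: Sequence[float], x: Row) -> float:
    """Weighted average of the per-source defect probabilities for one module."""
    return sum(w * m(x) for m, w in zip(models, weights))


# Toy usage: source A matches the target's feature means, source B does not.
src_a = [[1.0, 2.0], [3.0, 4.0]]
src_b = [[10.0, 10.0], [12.0, 12.0]]
target = [[2.0, 3.0], [2.0, 3.0]]
importance = [0.5, 0.5]
weights = source_weights([src_a, src_b], target, importance)
models = [lambda x: 0.8, lambda x: 0.2]  # stand-ins for trained per-source classifiers
score = ensemble_score(models, weights, target[0])
```

Because source A has the same feature means as the target, it receives most of the weight, so the ensemble score lies much closer to A's prediction (0.8) than to B's (0.2).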

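The evaluation measures named in the abstract can be illustrated with their common textbook definitions: PD (probability of detection, i.e., recall) = TP/(TP+FN), PF (probability of false alarm) = FP/(FP+TN), MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and IFA as the number of non-defective modules inspected before the first defective one. The sketch below assumes IFA ranks modules by a raw predicted score. Effort-aware studies often rank by score divided by module size instead, so the paper's exact computation may differ. $P_{opt}20\%$ is omitted because its definition varies across studies.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def pd_pf_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """PD (recall), PF (false-alarm rate), and MCC; 0.0 when a denominator is zero."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return pd, pf, mcc


def ifa(y_true: Sequence[int], scores: Sequence[float]) -> int:
    """Initial false alarms: non-defective modules inspected, in descending
    score order, before the first truly defective module is reached."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    count = 0
    for i in order:
        if y_true[i] == 1:
            return count
        count += 1
    return count


# Toy usage with six modules.
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.4, 0.9, 0.3, 0.8, 0.1, 0.7]
pd, pf, mcc = pd_pf_mcc(y_true, y_pred)
first_hit = ifa(y_true, scores)
```

Here TP = 2, FP = 1, TN = 2, and FN = 1, so PD = 2/3, PF = 1/3, and MCC = 3/9 = 1/3. The two highest-scored modules are clean, so IFA = 2.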
Updated: 2024-03-25