Automatic Debugging of Design Faults in MapReduce Applications
IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2024-02-26, DOI: 10.1109/tse.2024.3369766
Jesús Morán 1, Antonia Bertolino 2, Claudio de la Riva 1, Javier Tuya 1

Among current technologies for analysing large volumes of data, the MapReduce processing model stands out. MapReduce is implemented in frameworks such as Hadoop, Spark or Flink, which manage program execution according to the resources available at runtime. The developer must therefore design the program so that it behaves correctly under every possible non-deterministic execution; when it does not, the program may fail because of a design fault. Debugging these faults is difficult because the data are processed non-deterministically in parallel, and the fault is caused not directly by the code but by its design. This paper presents a framework called MRDebug, which includes two debugging techniques focused on MapReduce design faults. A spectrum-based fault localization technique locates the root cause of these faults by analysing several executions of the test case, and a Delta Debugging technique isolates the data relevant to triggering the failure. An empirical evaluation with 13 programs shows that MRDebug is effective in debugging the faults, especially when the localization is performed on the reduced data. In summary, MRDebug automatically provides valuable information for understanding MapReduce design faults: it helps locate their root cause and obtains a minimal data set that triggers the failure.
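To make the idea of isolating failure-relevant data concrete, the following is a minimal sketch of classic delta debugging (ddmin) applied to a toy design fault. It is not MRDebug's implementation. The fault, the function names (`mean_with_combiner`, `fails`, `ddmin`) and the input data are all assumptions made for illustration. The toy fault is a combiner that averages each partition before the reducer averages the partial results, which is only correct for some partitionings of the input.

```python
def mean_with_combiner(values, split):
    """Hypothetical faulty design: a combiner averages each partition,
    then the reducer averages those partial averages."""
    parts = [values[:split], values[split:]]
    partials = [sum(p) / len(p) for p in parts if p]
    return sum(partials) / len(partials)


def fails(values):
    """Oracle (an assumption): the test fails if any two-way partitioning
    of the input yields a result different from the sequential mean."""
    if not values:
        return False
    expected = sum(values) / len(values)
    return any(
        abs(mean_with_combiner(values, s) - expected) > 1e-9
        for s in range(len(values) + 1)
    )


def ddmin(data, test):
    """Simplified ddmin: shrink `data` to a 1-minimal list for which
    `test` still reports a failure."""
    n = 2  # current number of chunks
    while len(data) >= 2:
        chunk = max(len(data) // n, 1)
        subsets = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if test(subset):        # a single chunk already fails
                data, n, reduced = subset, 2, True
                break
            if test(complement):    # failure survives without this chunk
                data, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(data):      # finest granularity reached
                break
            n = min(n * 2, len(data))
    return data


records = [5, 5, 5, 5, 9, 5, 5, 5]   # made-up input data
minimal = ddmin(records, fails)
print(minimal)
```

On this input the reduction ends with three records, one of which is the outlier 9. Two records can never expose the fault, because every two-way split of a pair yields the correct mean. The oracle re-runs the program under several partitionings, a simple stand-in for the non-determinism of a real framework.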

Updated: 2024-02-26