Debugging big data analytics in Data-Intensive Scalable Computing (DISC) systems is a time-consuming effort. Today’s DISC systems offer very little tooling for debugging and, as a result, programmers spend countless hours analyzing log files and performing trial and error debugging. To aid this effort, UCLA developed BigDebug, an interactive debugging tool and automated fault localization service to help Apache Spark developers in debugging big data analytics.
To emulate interactive step-wise debugging without reducing throughput, BigDebug provides simulated breakpoints that enable a user to inspect a program without actually pausing the entire distributed computation. It also supports on-demand watchpoints that enable a user to retrieve intermediate data using a guard predicate and transfer the selected data on demand. To understand the flow of individual records within a pipeline of RDD transformations, BigDebug provides data provenance capability, which can help understand how errors propagate through data processing steps. To support efficient trial-and-error debugging, BigDebug enables users to change program logic in response to an error at runtime through a realtime code fix feature, and selectively replay the execution from that step. Finally, BigDebug proposes an automated fault localization service that leverages all the above features together to isolate failure-inducing inputs, diagnose the root cause of an error, and resume the workflow for only affected data and code.
The BigDebug system should contribute to improving Spark developerproductivity and the correctness of their Big Data applications. This big data debugging effort is led by UCLA Professors Miryung Kim and Tyson Condie, and produced several research papers in top Software Engineering and Database conferences. The current version of BigDebug is publicly available at https://sites.google.com/site/sparkbigdebug/.
17. Feature 5: Automated Fault Localization
Goal: Given a test function and set of failing results
◦ Identify the minimum set of input records that can reproduce the failure
◦ We apply data provenance and delta debugging in tandem
After all is said
and done more is
said than done
than done
done more is
said
After all is
said and
After
all
is
said
and
done
more
is
said
than
done
(After,1)
(all,1)
(is,1)
(said,1)
(and,1)
(done,1)
(more,1)
(is,1)
(said,1)
(than,1)
(done,1)
(After,1)
(all,1)
(is,2)
(and,1)
(more,1)
(said,2)
(than,1)
(done,2)
After,1
all, 1
is, 2
and, 1
more, 1
said, 2
done, 2
than, 1
FlatMap Map ReduceByKey CollectTextFile
Task 1
Task 3
Task 1
Task 3
Task 2Task 2
Stage 2Stage 1