Melike Geçer

Master’s Degree Thesis: Debugging Spark Applications

Debugging is key to identifying errors in a software program, so that important problems can be detected and avoided early. Spark is an engine used to run analyses on large-scale data. Debugging Spark applications is especially important because Spark provides no debugging tool apart from log files. However, an application may produce a lengthy log file that is challenging to examine.

In this research, I investigated the methods Spark developers use to debug their applications. The problem was introduced by Haidar Osman, one of the supervisors of the thesis, as an important problem in the industry. I searched for the most frequently asked questions about Spark and reproduced 5 error logs covering 4 different kinds of exceptions. A series of interviews was then held with Spark developers, and their methodologies were studied to find a pattern in how they track down an error in a log.