Quantifying Programmers' Mental Workload during Program Comprehension Based on Cerebral Blood Flow Measurement: A Controlled Experiment

Takao Nakagawa, Nara Institute of Science and Technology, Nara, Japan, takao-n@is.naist.jp
Yasutaka Kamei, Kyushu University, Fukuoka, Japan, kamei@ait.kyushu-u.ac.jp
Hidetake Uwano, Nara National College of Technology, Nara, Japan, uwano@info.nara-k.ac.jp
Akito Monden, Nara Institute of Science and Technology, Nara, Japan, akito-m@is.naist.jp
Kenichi Matsumoto, Nara Institute of Science and Technology, Nara, Japan, matumoto@is.naist.jp
Daniel M. German, University of Victoria, BC, Canada, dmg@uvic.ca

ABSTRACT
Program comprehension is a fundamental activity in software development that cannot be easily measured, as it is performed inside the human brain. Using a wearable Near Infra-red Spectroscopy (NIRS) device to measure cerebral blood flow, this paper tries to answer the question: Can the measurement of brain blood-flow quantify programmers' mental workload during program comprehension activities? We performed a controlled experiment with 10 subjects; 8 of them showed higher cerebral blood flow while understanding strongly obfuscated programs (requiring high mental workload). This suggests the possibility of using NIRS to measure the mental workload of a person during software development activities.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging; D.2.8 [Software Engineering]: Metrics

General Terms
Measurement

Keywords
Program comprehension, mental workload, cerebral blood flow measurement

1. INTRODUCTION
Program comprehension is a fundamental activity required in today's software development processes such as coding, code review, debugging, code reuse and maintenance. Its measurement is difficult, as it is a mental (cognitive) process performed inside the human brain.

To measure such mental activities, recent neuroscience and cognitive science studies try to directly measure brain activity using sensors such as EEG, fMRI and NIRS [1]. In the software engineering domain, Siegmund et al. [6] pointed out (at the FSE 2012 New Idea Track) the necessity of analysing brain activities in program comprehension. They proposed an experiment design using fMRI (functional magnetic resonance imaging) measurement; however, no result has been reported so far, and research progress in this area is strongly demanded.

In this paper, we focus on the measurement of programmers' mental workload during program comprehension to answer the question: Can brain measurement quantify programmers' mental workload in program comprehension? If the measurement could identify a programmer's very high workload, which may imply that the work is beyond his/her capacity, timely help by an expert or a manager needs to be considered.

This paper presents an experiment design using a wearable NIRS (Near Infra-red Spectroscopy) device to observe the cerebral blood flow of the prefrontal cortex (PFC), which has been considered to govern planning of complex cognitive behaviour and decision making [7]; therefore, we believe PFC activity is vital in program comprehension. In the experiment, we asked each subject to perform two tasks: 1) comprehension of non-obfuscated C programs, and 2) comprehension of strongly obfuscated C programs, which should require a higher mental workload of the PFC.

In a controlled experiment with 10 graduate students in computer science, 8 students showed higher cerebral blood flow while reading the obfuscated versions. This suggests the possibility of measuring mental workload in program comprehension using NIRS, while we also identified several improvements needed in future experiments to clarify the feasibility and limitations of our approach.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE'14, May 31 – June 7, 2014, Hyderabad, India. Copyright 2014 ACM 978-1-4503-2768-8/14/05 ...$15.00.

2. RELATED WORK
To understand the human aspect of program comprehension (such as understanding level, developers' behaviour, and comprehension strategy), researchers have used indirect measurements such as interviews, questionnaires or the 'think-aloud' protocol (subjects speak aloud what they are thinking during the experiment [2]).

Parnin [5] analysed both short-term and long-term memory retention of developers working on parallel programming tasks from the viewpoint of cognitive neuroscience. Also, Nakamura et al. [4] focused on the remembering, recalling and forgetting of variables in source code to develop a model of program comprehension. Their experiment showed that the time required to complete a comprehension task matched the difficulty of recalling a variable well.

Siegmund et al. [6] applied a neuroscientific approach to the program comprehension process and, at FSE'12, proposed an experiment plan for identifying cortical regions related to program comprehension. They pointed out the necessity of analysing brain activities to answer questions such as "What distinguishes good programmers from bad programmers?" or "What makes a good programmer?" However, they presented only an interim report on the progress of the experiment and its design, and have not published results yet. Siegmund et al. predict that the prefrontal cortex (related to memory operation and complex intellectual activity) will be activated when developers try to understand a program.

These studies suggest that cognitive processes related to human memory exist during program comprehension. However, there are no experimental results about brain activation during program comprehension or programming. Thus, little is known about how the brain actually works during program comprehension tasks.

3. EXPERIMENT

3.1 Subjects
Ten students of Nara Institute of Science and Technology participated in the experiment as subjects. All subjects are male, 22-26 years old, and have at least 3 years of experience using the C language.

3.2 Programs and Assignment
Six programs (three algorithms and two difficulty levels) of 17-32 lines of code, all written in C, are used. The three algorithms are searching for a keyword, calculating total values, and seeking the maximum value in an array. One of the algorithms is used in an exercise task before the main experiment task.

To prepare programs at two difficulty levels for each algorithm, we use an obfuscation technique to make a 'hard' version of a program from an 'easy' (non-obfuscated) version. We used a loop obfuscation so that loop counters and sentinel values are updated frequently and irregularly without changing the functionality of the program [3]. Figure 1 shows a non-obfuscated program that seeks the minimum value in an array.

Figure 1: The sample of the program text and preconditions given to subjects

Two tasks that differ in both difficulty level and functionality are assigned to each subject. To reduce the learning effect, half of the subjects perform the easy task first, and the others perform the hard one first. All subjects perform an exercise task before the main experimental task. The exercise task has two complexity levels similar to the main experimental task.

3.3 Task
To standardize the strategy of program comprehension among all subjects, they read and simulate the execution of a program using the mental simulation strategy (also known as hand simulation). It is one of the bottom-up program comprehension strategies, in which the reader simulates the program's execution process (e.g., control flow and variable assignment). To properly trace the program, subjects have to remember the current position in the loop flow and the names and values of variables in their short-term memory during mental simulation.

During the mental simulation, when the subjects reach a checkpoint marked in the program, like (1) of Line 4 and (2) of Line 8 in Figure 1, they write down the value of each variable at the checkpoint on an answer sheet. After writing down these values, they raise their hand. An experimenter (one of the authors) checks the values on the answer sheet and tells the subject whether or not the answer is correct.

If the answer is correct, the subject continues the comprehension from the current checkpoint to the next one. If not, the subject goes back to a previous checkpoint and restarts the comprehension task. When the subject correctly answers the last checkpoint, marked at the return statement in the program (like (3) of Line 12 in Figure 1), the task is complete.

3.4 Equipment and environment
We use a NIRS device (Wearable Hikari Topography WOT-200, made by HITACHI MEDICO). Figure 2 shows the appearance of the device.

Figure 2: WOT-200 (Hitachi Medical Co.)

NIRS assumes that higher brain activity requires more oxygen to be transported by the blood flow. Therefore, to quantify the brain activity, NIRS measures the amount of oxygenated haemoglobin (oxy-Hb) in the cerebral blood flow.

We consider this device suitable for measuring program comprehension under conditions similar to the real environment, because it is lightweight, can be easily set on the subject's head, and does not keep the subject's body in a fixed position during an experiment, in contrast to fMRI, MEG and PET, which have a finer and wider spatial resolution than NIRS.

Subjects sit down during the experiment. Experiments are performed in a quiet room where only the subject and the experimenter are present. To avoid noise in the measurements, an experimenter (one of the authors) asked the subjects not to lower or raise their head. The subjects adjust the position and the height of the chair before the beginning of the experiment. The program text and an answer sheet are put in front of the subjects.

3.5 Metrics
Since the amount of oxy-Hb can be measured only as a relative value from the beginning of the measurement, we used a normalized value based on the following equation:

\[ \text{Normalized oxyHb} = \frac{\text{oxyHb} - \min(s)}{\max(s) - \min(s)} \]

where max(s) and min(s) are the maximum and minimum values through all tasks of each subject s. The range of the normalized oxy-Hb is [0,1]. We measured the normalized oxy-Hb every 200 ms.

4. RESULTS AND DISCUSSION
Figure 3 shows the distribution of normalized oxy-Hb for each subject and task. Labels A to J represent the subjects (for each subject, the left box shows the distribution for 'easy', and the right one shows 'hard'). The y-axis corresponds to the normalized oxy-Hb (i.e., how actively the brain works).

Figure 3: Distribution of normalized oxy-Hb (x-axis: subjects A to J, easy and hard tasks; y-axis: normalized oxy-Hb)

We found that the normalized oxy-Hb of hard tasks is larger than that of easy tasks for all subjects except E and G. This result suggests that the complexity of the program induces the activation of the prefrontal cortex; thus, we consider that mental workload could be quantified using cerebral blood flow measurement.

Another finding is that the variance of normalized oxy-Hb of hard tasks is larger than that of easy tasks for all subjects except E. This suggests that even in a hard task, mental workload is often very low. Figure 4 shows the time-course changes of subject A's data while performing the 'hard' task, which indicates that the amount of oxy-Hb continues to change throughout the experiment. Therefore, additional measurement, such as PC operation history and eye-gaze tracking, is needed in future studies to observe subjects' external behaviours.

Figure 4: Chronological changes of brain activation (Subject A; x-axis: time [s]; y-axis: normalized oxy-Hb)

The result also indicates that some subjects (E and G in our case) may show a tendency opposite to the others. This could happen for several reasons, e.g., 1) measurement error (the sensor may not fit some subjects' foreheads well), 2) the subject's skill (highly skilled subjects may not feel any difficulty in hard tasks), 3) the subject's natural property (some subjects may not require high oxy-Hb in mental simulation), etc.

For further analysis, Figure 5 shows the distribution of normalized oxy-Hb of E and G in the exercise tasks. Interestingly, subject E showed the same counter-trend reaction (oxy-Hb during 'easy' higher than 'hard') in the exercise task (Figure 5, left graph). Further experiments are required in the future to analyse why and how often this would happen. At least, an interview with the subjects after the experiment is needed to clarify whether all subjects felt the 'hard' task was more difficult than the 'easy' task.

Figure 5: Normalized oxy-Hb in an exercise task (subjects E and G)

Figure 6 shows the result of the time-series analysis. We equally divided the task completion time into three parts: the early stage, the middle stage, and the final stage. Each bar in Figure 6 shows the median of the normalized oxy-Hb of each stage.

Figure 6: Chronological changes of brain activation (x-axis: subjects A to J; y-axis: normalized oxy-Hb)

Figure 6 indicates that normalized oxy-Hb is higher in the middle stage than in the early stage (8 out of 10 subjects) and higher in the middle stage than in the final stage (7 out of 10 subjects). This may have happened because most wrong answers occurred in the middle stage, which implies that a high workload is required to correct the answers. This result suggests the possibility of quantifying the time-course change of mental workload using NIRS.

Threats to validity: To generalize our result, we need to consider the top-down comprehension strategy and its difficulty, because our experiment lets subjects use only the bottom-up strategy (i.e., mental simulation). However, we believe that our method can be applied to another strategy if the difficulty levels of the programs are well defined, because the PFC has a strong relation with complicated intellectual activity and the top-down strategy is as complicated as the bottom-up strategy.

5. CONCLUSIONS
In this paper, we aimed to investigate whether or not developers' workload can be quantified using cerebral blood flow measurement of the prefrontal cortex. In our experiment, we measured the amount of oxygenated haemoglobin (oxy-Hb) during comprehension of two different types of programs, 'hard' (high complexity) and 'easy' (low complexity). The result showed a tendency for oxy-Hb to be higher in 'hard' programs than in 'easy' programs, which suggests the possibility of measuring mental workload by oxy-Hb.

In the future, we are planning to compare program comprehension tasks with other cognitive tasks such as reading a natural language text or doing a mathematical calculation. Also, we are planning to conduct very-easy/very-hard tasks as baseline tasks, e.g., doing nothing with eyes closed as very-easy, and doing an extremely difficult mathematical calculation as very-hard. We also plan to use other measurement sources, e.g., history of PC operations, eye-tracking and interviews with subjects.

6. REFERENCES
[1] R. Cabeza and L. Nyberg. Imaging cognition II: An empirical review of 275 PET and fMRI studies. Journal of cognitive neuroscience, 12(1):1–47, 2000.
[2] K. A. Ericsson and H. A. Simon. Verbal reports as data. Psychological review, 87(3):215, 1980.
[3] A. Monden, Y. Takada, and K. Torii. Method for scrambling programs containing loops. IEICE Trans. on Information and Systems, 80(7):644–652, 1997. (in Japanese).
[4] M. Nakamura, A. Monden, T. Itoh, K. Matsumoto, Y. Kanzaki, and H. Satoh. Queue-based cost evaluation of mental simulation process in program comprehension. In Proc. of 9th IEEE International Software Metrics Symposium (METRICS'03), pages 351–360, 2003.
[5] C. Parnin. A cognitive neuroscience perspective on memory for programming tasks. In Proc. of 22nd Annual Meeting of the Psychology of Programming Interest Group (PPIG), 2010.
[6] J. Siegmund, A. Brechmann, S. Apel, C. Kästner, J. Liebig, T. Leich, and G. Saake. Toward measuring program comprehension with functional magnetic resonance imaging. In Proc. of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE '12), pages 24:1–24:4, 2012.
[7] Y. Yang and A. Raine. Prefrontal structural and functional brain imaging findings in antisocial, violent, and psychopathic individuals: A meta-analysis. Psychiatry Research: Neuroimaging, 174(2):81–88, 2009.