Split-Paper Testing: A Novel Approach to Evaluate Programming Performance

Journal of Information Processing,Vol. 28
「Split-Paper Testing: A Novel Approach to Evaluate Programming Performance」

Yasuichi Nakayama, Yasushi Kuno and Hiroyasu Kakuda
Split-Paper Testing: A Novel Approach to Evaluate Programming Performance
There is a great need to evaluate and/or test programming performance. For this purpose, two schemes have been used. Constructed response (CR) tests let the examinee write programs on a blank sheet (or with a computer keyboard). This scheme can evaluate the programming performance. However, it is difficult to apply in a large volume because skilled human graders are required (automatic evaluation is attempted but not widely used yet). Multiple choice (MC) tests let the examinee choose the correct answer from a list (often corresponding to the “hidden” portion of a complete program). This scheme can be used in a large volume with computer-based testing or mark-sense cards. However, many teachers and researchers are suspicious in that a good score does not necessarily mean the ability to write programs from scratch. We propose a third method, split-paper (SP) testing. Our scheme splits a correct program into each of its lines, shuffles the lines, adds “wrong answer” lines, and prepends them with choice symbols. The examinee answers by using a list of choice symbols corresponding to the correct program, which can be easily graded automatically by using computers. In particular, we propose the use of edit distance (Levenshtein distance) in the scoring scheme, which seems to have affinity with the SP scheme. The research question is whether SP tests scored by using an edit-distance-based scoring scheme measure programming performance as do CR tests. Therefore, we conducted an experiment by using college programming classes with 60 students to compare SP tests against CR tests. As a result, SP and CR test scores are correlated for multiple settings, and the results were statistically significant. Therefore, we might conclude that SP tests with automatic scoring using edit distance are useful tools for evaluating the programming performance.
Journal of Information Processing,Vol. 28, pp.733-743 (2020).