
Towards a Framework for Mining Students'
Programming Assignments

Ella Albrecht, Jens Grabowski

Institute of Computer Science
University of Göttingen
Göttingen, Germany
ella.albrecht@cs.uni-goettingen.de

Abstract: Due to an increasing number of students, more and more learning institutions use computer-supported learning tools such as online learning platforms or intelligent tutoring systems. This has opened up the opportunity to collect a huge amount of student data. Educational Data Mining (EDM) uses mining techniques to derive information from these data about students' knowledge, behavior, and experience in order to improve education. In this paper, we present a framework for mining programming errors of computer science students by analyzing the students' solutions to a programming assignment. The framework serves as both a computer aided assessment tool and an immediate feedback tool that informs the educator about the learning progress of the students.

Keywords: computer aided assessment; educational data mining; programming errors

I. INTRODUCTION

Assessment is a useful means to provide feedback to students as well as to educators about the learning progress. However, providing qualitative feedback is a time-consuming task. The assessment of programming exercises is especially hard, since the number of possible solutions is unlimited and student numbers keep increasing. Often, programming exercises are assessed by using black box tests which compare the outputs of the students' solutions with the expected output to determine whether the solutions are correct. The disadvantage of this practice is that black box tests do not deliver further information about the errors made. So either the students get no concrete information about their faults at all, or the teaching staff has to go through the solutions manually to provide valuable feedback.

In recent years, many institutes have registered an increasing number of beginning students. Well over 200 students enrolled in computer science in the winter semester 2015/16 at the University of Göttingen, about 65% more than the year before. This hampers manual assessment. To cope with the number of students, a variety of automatic assessment tools has been developed [1].

Recent computer aided assessment (CAA) tools, e.g., ASSYST [2], are not only able to check the correctness of a program, but also its complexity, efficiency, and style. Furthermore, the tools try to give students enhanced feedback on their mistakes. For example, JACK [3], a web-based system for the assessment of Java programs, offers feedback in the form of run time traces [4] and graph representations of data structures [5].

To investigate common student programming errors, various studies have been carried out. They differ in the investigated programming language and in the kind of information about mistakes and how it was obtained. One of the first studies on first-year students' mistakes [6] used goal/plan analysis to find logical errors in Pascal programs and to suggest possible misconceptions the students may have had. Further attempts to gather common programming errors were the evaluation of compiler error logs [7]-[9], observations by the teaching staff during class [10], and interviews and surveys with students and educators [7], [11].

Although CAA has opened the possibility to collect a huge amount of student data, this opportunity is not fully exploited. Most CAA tools just assess a student's solution and provide some kind of feedback. Data gained during the assessment process, e.g., compiler logs, traces, or code metrics, could be used to detect which students require support or to predict their performance. However, the generated data are often not processed and thus remain unused [12].

In this paper, we present a framework which serves as a CAA tool for programming assignments. In contrast to many existing CAA tools, it exploits the data generated during the analysis of the students' solutions to draw conclusions about students' programming behavior and skills. The paper is organized as follows. In Section 2, we introduce our research project and the basic structure of the framework. The particular phases and the current status of the project are then discussed in Section 3. Section 4 deals with the implementation details of a prototype of the framework for the analysis of students' C programs.

II. PROJECT AND FRAMEWORK STRUCTURE

In our project, we are developing a framework that aims at collecting and processing data about students' solutions to a programming assignment. It serves as both a computer aided assessment tool and a mining platform which provides immediate feedback about the learning progress of the students to the educator. The collected data can be examined to draw conclusions about students' programming behavior and skills. The data are composed of the students' programming errors, an evaluation of the programming style, and several software metrics.
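To make this data model concrete, the following C sketch shows one plausible per-solution record. All field names and sizes are illustrative assumptions; the paper does not specify an actual schema.

/* Hypothetical record of the data collected for one student solution.
 * Field names are illustrative; the project's real schema is not given. */
struct solution_record {
    char   student_id[32];     /* unique student identifier */
    int    submission_no;      /* number of the submitted solution */
    int    compiled_ok;        /* result of the static analysis */
    int    tests_passed;       /* passed test cases from the dynamic analysis */
    int    n_syntax_errors;
    int    n_semantic_errors;
    int    n_logic_errors;
    double style_score;        /* compliance with the code conventions */
    double metrics[8];         /* software metrics, e.g., LOC or complexity */
};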
A. Programming Errors

The core element of the framework is the detection, localization, and classification of programming errors. Multiple classification schemes for errors exist [13]. In our work, we divide the errors into syntax, semantic, and logic errors. Syntax errors are errors due to the violation of grammatical rules of the programming language, e.g., a missing semicolon or omitted or misplaced braces. All syntax errors can be detected by a compiler. When the compiler finds an error, it returns an error message with the reason for the error and the line number at which the error is located.

A semantic error occurs if programming constructs are used wrongly, even if the program is syntactically correct. We can distinguish between static and dynamic semantic errors. Static semantic errors are semantic errors that are detected by a compiler, e.g., uninitialized variables or type mismatches. Dynamic semantic errors (or runtime errors) are errors which occur during the execution of a program, e.g., an index out of bounds or a null pointer exception. Some dynamic semantic errors can be detected before execution by static analysis tools. Syntax as well as semantic errors lead to a non-execution or an interruption of the execution of a program.

Logic errors cause a program to not do what it is intended to do. This error type is independent of the programming language and occurs when the programmer makes a mistake in the design of the program or has implemented an algorithm incorrectly. Typical logic errors are, e.g., off-by-one errors or unregarded cases. Logic errors are the most difficult to find and can only be revealed by testing.
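As a constructed illustration of these categories (not taken from an actual student submission), the following C fragment contains both a static semantic error and a logic error:

#include <stdio.h>

int main(void) {
    int values[5] = {1, 2, 3, 4, 5};
    int sum;                       /* static semantic error: sum is never initialized */
    for (int i = 0; i <= 5; i++) { /* logic error: off-by-one, i <= 5 also reads values[5] */
        sum += values[i];
    }
    printf("%d\n", sum);           /* compiles, but the result is undefined */
    return 0;
}

A compiler can warn about the uninitialized variable, whereas the off-by-one error typically passes compilation silently and only shows up as a wrong output during testing.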
B. Basic Structure of the Framework

Fig. 1 gives an overview of the basic structure of the framework. At first, the students' solutions are analyzed statically by using compilers and other static analysis tools to detect syntax and semantic errors. Those solutions that compile successfully are then analyzed dynamically by using unit or black box tests. These tests determine which of the solutions are correct. Afterwards, the compilable solutions are clustered by similarity, where each cluster has to contain at least one correct solution. The correct solutions are compared with the students' solutions in the same cluster to localize logic errors. Errors found during static analysis and errors localized during program comparison are classified into error types and stored in a database. Results from the dynamic analysis are also stored in the database, such that the data can be used to provide enhanced feedback to students as well as to educators.

III. PROJECT PHASES

For our project, we have chosen an incremental development process which proceeds in three phases. In Fig. 1, parts of the framework which will not be added until the second phase of our project are shaded in gray. Dashed objects are parts which are worked on in the first as well as in the second phase. Since the third phase only involves evaluation, it is not depicted in the figure.

A. Phase 1: Basic Static and Dynamic Analysis

In the first phase, we use conventional program analysis methods to study students' solutions. For the identification of syntax and semantic errors, the results of different compilers and static analysis tools are parsed. The found errors are then assigned to the corresponding error types. We collect errors as well as warnings, since a warning may give a hint at a semantic error. For example, for the program in Fig. 2 a compiler may return the warning message:

s5.c:13:10: warning: if statement has empty body [-Wempty-body]
    if (i>0);
            ^

The message signifies a semicolon directly after the if-condition in line 10, and therefore the if-statement has an empty body. Most often, it is not the intention of programmers to close the statement in such a case; rather, they intended to include the subsequent instructions in the block of the if-statement.
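The exact warning sets differ between compilers and versions; as a sketch, warnings of this kind can typically be requested as follows (the message shown above follows clang's output format; in gcc, -Wempty-body is enabled by -Wextra):

clang -Wall -fsyntax-only s5.c
gcc -Wall -Wextra -fsyntax-only s5.c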
When students start to program, they often focus only on the functionality of the program. Missing indentation or odd variable names make it hard to read and understand the code. It is important that students learn to write not only correct but also readable code. Programming style is composed of two aspects: the compliance with specific code conventions and the personal style of the programmer, e.g., preferred programming structures, length of variable names or lines, number of variables, and usage of blank lines. Several style checking tools [14]-[16] exist to automatically validate whether code complies with the agreed conventions. For the investigation of the students' personal programming style, we collect various code metrics, e.g., the ones used in [17] and [18]. We also use code metrics for the analysis of the complexity of the students' solutions and for the calculation of similarity as part of our static analysis.

The steps after the static analysis are only executed for solutions that compiled successfully, since the subsequent steps require execution of the student's solution. To check whether a solution is correct, we use black box or unit tests, depending on whether students had to implement a whole program or just a function. These tests consist of different test cases defining input parameters and the expected output of the program or the function, respectively. Furthermore, we collect information about the code coverage for each test case and measure the execution time.

Feedback regarding the mistakes students made in their solutions is crucial to improve their knowledge. But it is also helpful for educators to know which problems their students are struggling with. Therefore, our framework provides feedback to students as well as to educators. In the first project phase, the feedback that is given to the students and the educators consists of the raw results of the different tools used to evaluate the solution and simple statistics about the errors made, respectively. To draw conclusions from the errors made, we need to classify them. Since the same errors can be detected with different tools, a mapping from the specific errors of the particular tools to defined error types is needed, so that the same mistakes are not counted multiple times. For example, for the program from Fig. 2, a compiler as well as a static analysis tool may state that the array access in line 8 is out of bounds.
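For illustration (the exact messages vary by tool and version, so this is a sketch): for the access arr[-4] in Fig. 2, gcc may emit a warning along the lines of "array subscript -4 is below array bounds", while cppcheck may report an error with an id such as negativeIndex. In our mapping, both reports would be assigned to the same defined error type, e.g., "array index out of bounds", and counted only once.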

The presented methods are used in most assessment tools and have been demonstrated to be sufficient to assess programming assignments in a simple way. At the end of the first phase, which will be in February/March 2016, the framework will be used for the first time in an introductory C course at the University of Göttingen. Participants of the course are first-year students in computer science, mathematics, physics, biology, and economics. Data collected during this course will be used to enhance the framework. This extension will be done in the second phase of the project.

B. Phase 2: Localization of Logic Errors

Since black box tests can only tell whether a program's output is incorrect but not why, in the second phase of the project we extend our framework by a deeper analysis which aims at the localization of logic errors.

The detection of logic errors requires knowledge about what the program is intended to do. The easiest way for an educator to describe the intention of a program is to provide a sample solution. In general, the problem whether two programs are semantically equivalent, i.e., have the same intention and behavior, is undecidable [19]. But as student programs are often very simple and do not have that many variations, comparison with a correct solution can reveal logic errors.

The more correct solutions are available for comparison, the higher is the probability of detecting all logic errors. However, the comparison of programs to detect logic errors is time-consuming, since, e.g., variables can be named differently and have to be matched, and control flow graphs, abstract syntax trees, or traces have to be compared. To reduce this effort, we cluster the solutions by similarity, because it is easier and more probable to localize logic errors in programs which are similar. Since we want to compare a student's solution with a correct solution, each cluster needs to contain at least one correct solution. After the clustering, we compare each solution within a cluster with the correct solutions in the cluster.
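Which similarity metric will be used is still open, as it is part of the second phase; a minimal sketch of one plausible option is to compare solutions by the distance between their code-metric vectors, which are already collected during static analysis:

#include <math.h>

/* Number and choice of metrics are assumptions for this sketch, e.g.,
 * lines of code, cyclomatic complexity, number of variables, branches. */
#define N_METRICS 4

/* Euclidean distance between the metric vectors of two solutions;
 * a smaller distance means the solutions are more similar. */
double metric_distance(const double a[N_METRICS], const double b[N_METRICS]) {
    double d = 0.0;
    for (int i = 0; i < N_METRICS; i++) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return sqrt(d);
}

Solutions whose pairwise distance falls below a threshold could then be placed in the same cluster, subject to the constraint stated above that each cluster contains at least one correct solution.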
The investigation of convenient similarity metrics and program comparison techniques will be part of the second phase of our project. Moreover, we will study the most common errors made by the students in order to improve the feedback. Students shall not only get raw results, but also hints why an error appears and how it could be resolved. For educators, the feedback is enhanced such that they can also see the evolution of the different error types and whether prior misunderstandings of particular programming concepts could be resolved successfully. The second phase is planned to last from April 2016 till March 2017 and will be completed with a further application in the introductory C course.

C. Phase 3: Evaluation

The third phase deals with the evaluation of the framework. We analyze the precision and the miss rate of the framework as well as its added value for the students and for the educator. Furthermore, we will use the gained data to investigate programming beginners' change in experience concerning programming style and the amount and types of errors made.

[Figure: block diagram showing student solutions flowing through compilation, static analysis (static code analysis, style checking, code metrics), black box tests, clustering against the educator's sample solution, error localization, and error classification into a results database that produces feedback for students and educator.]

Fig. 1. Basic structure of the framework

1: #include <stdio.h>
2: #include <limits.h>
3:
4: int main(void) {
5: char c = 'A';
6: int i;
7: int arr[5];
8: int a = arr[-4];
9: char *f (const char *s);
10: if (i>0);
11: while (c <= 'z') {
12: printf("%c", c);
13: c = c + 1;
14: }
15: return 0;
16: }

Fig. 2. Example of a student's program that aims at printing all characters between 'A' and 'z'

D. Current Status of the Project

At the moment, we are in the first phase of our project. We have implemented a prototype of the framework. Several tools were integrated to perform a basic analysis of the students' solutions. Details on the implementation are given in Section IV. Furthermore, we have prepared the exercises and the required test files for black box tests for the C course. In a preliminary study, students solved the exercises and tested their programs using the provided test files. The outcome was that the exercises can be solved in an adequate time by programming beginners. In a few exercises, some students had problems understanding what is actually demanded, so we adjusted the formulation of the task accordingly. Furthermore, we corrected some minor errors in the test files.

IV. IMPLEMENTATION

We have implemented a prototype of the framework for the analysis of students' C programs. Fig. 3 shows the preference window for a new evaluation. The educators first need to choose a programming language before the available options are shown. At the moment, only C is supported as a language, but it is planned to also include Java. The educator can choose a course and enter an exercise name and exercise number. The educators also have to define the folder which contains the files needed for test execution, e.g., dejagnu or cunit files which define test cases, and the destination folder on a virtual machine (VM) to which the test files and the student's solution are copied and on which the dynamic tests will be executed. Furthermore, they can choose which tests shall be executed for the exercise.

Fig. 3. Preference window for a new evaluation

The students' solutions are imported via a zip file with the following folder structure:

student/submissionNo/

where student is a unique identifier for a student and submissionNo is the number of the solution submitted by this student, if multiple submissions are allowed. We need this information when we want to investigate the evolution of a student's programming experience.
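For example, a submission archive might unpack as follows (the student identifiers and file names here are hypothetical):

s0042/1/solution.c
s0042/2/solution.c
s0097/1/solution.c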
For compilation, we use gcc [20] and clang [21]. We use two different compilers because we have observed that they return different results regarding semantic checks and warnings. As a further static analysis tool, we have integrated splint [22], a tool for checking C programs for security vulnerabilities and coding mistakes. The drawback of splint is its high false positive rate. As a second static code analysis tool, we integrated cppcheck [23], which does not detect as many errors as splint but has a lower false positive rate. To check whether the code conventions are complied with, we use vera++ [24]. It provides a set of rules for checking the style of a program, but it also offers the possibility to formulate own rules in Tcl (Tool Command Language). The output of these tools is written to text or XML files. We parse those files and look up the corresponding error types in the database. The identification of the error types depends on the tool, e.g., reported errors of gcc are matched against predefined regular expressions to determine the error type. Cppcheck, however, returns an XML file where each error has an attribute containing the name of the error, so we simply have to look up the name in the database to obtain the error type. Each error is stored in the database with the raw error message, the line in which it was found, and the associated error type. Furthermore, a boolean field exists for each error, such that an educator can manually mark an identified error as a false positive.
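A minimal sketch of the regular-expression matching for gcc messages might look as follows; the patterns and error-type names are invented for illustration, since the real ones are stored in the project database:

#include <regex.h>
#include <stddef.h>

/* Illustrative mapping from message patterns to error types;
 * the actual patterns and type names live in the database. */
struct rule {
    const char *pattern;     /* POSIX extended regular expression */
    const char *error_type;  /* defined error type it maps to */
};

static const struct rule rules[] = {
    { "expected ';'",                "syntax: missing semicolon" },
    { "is used uninitialized",       "semantic: uninitialized variable" },
    { "if statement has empty body", "semantic: empty if body" },
};

/* Returns the error type for a raw compiler message, or NULL if unknown. */
const char *classify(const char *message) {
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        regex_t re;
        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue;                       /* skip unparsable pattern */
        int matched = (regexec(&re, message, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return rules[i].error_type;
    }
    return NULL;                            /* counted as unclassified */
}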
Dynamic testing is performed in a virtual machine for security reasons. We use cunit [25] for testing single functions and dejagnu [26] for testing a complete program. Dejagnu is a framework for testing programs independently of the programming language by comparing a program's output to an expected output. If the execution of a test case takes longer than 10 seconds, a timeout is reported. Since the expected programs are very simple and should have a short execution time, a timeout indicates an infinite loop in most cases.
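A unit test for a single function could be registered with cunit as in the following sketch; the function sum and its test values are invented for illustration, as the actual test files belong to the course exercises (error handling is omitted for brevity):

#include <CUnit/Basic.h>

/* Hypothetical function under test, linked in from the student's solution. */
int sum(int a, int b);

static void test_sum(void) {
    CU_ASSERT_EQUAL(sum(2, 3), 5);
    CU_ASSERT_EQUAL(sum(-1, 1), 0);
}

int main(void) {
    CU_initialize_registry();
    CU_pSuite suite = CU_add_suite("sum", NULL, NULL);
    CU_add_test(suite, "test_sum", test_sum);
    CU_basic_set_mode(CU_BRM_VERBOSE);
    CU_basic_run_tests();                    /* prints pass/fail results */
    unsigned int failed = CU_get_number_of_failures();
    CU_cleanup_registry();
    return failed == 0 ? 0 : 1;              /* non-zero exit on failure */
}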

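The timeout handling can be sketched with standard POSIX primitives; this illustrates the described behavior and is not the project's actual code:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a student's program and enforces a wall-clock limit (10 s in our
 * setup). Returns the child's wait status, or -1 on timeout, which in
 * most cases indicates an infinite loop. */
int run_with_timeout(const char *program, unsigned int seconds) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {                     /* child: execute the solution */
        execl(program, program, (char *)NULL);
        _exit(127);                     /* exec failed */
    }
    for (unsigned int waited = 0; waited < seconds; waited++) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return status;              /* finished within the limit */
        sleep(1);                       /* coarse 1-second polling */
    }
    kill(pid, SIGKILL);                 /* timeout: terminate and reap */
    waitpid(pid, NULL, 0);
    return -1;
}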
The result for each test case is stored in the database along with the associated coverage information if the coverage option was selected. For the coverage analysis, we use gcov [27] and its graphical frontend lcov [28].
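A typical gcov workflow (sketched here; the exact flags in our scripts may differ) instruments the solution at compile time, runs the test cases, and then extracts per-line execution counts:

gcc -fprofile-arcs -ftest-coverage -o solution solution.c
./solution < testcase1.in
gcov solution.c
lcov --capture --directory . --output-file coverage.info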
After the evaluation of the students' solutions, the tool creates a zip file containing an error report for each student. The report contains the results of the different tests. For the educational staff, the tool provides statistics about common errors for the current exercise as well as for the overall course. Furthermore, educators can take a look at each result for any student and any exercise and check whether the reported errors are false positives.

V. CONCLUSION AND FUTURE WORK

In this paper, we have presented a framework which can be used for the automatic assessment of programming assignments, for the identification of common programming errors made by students, and for the investigation of the evolution of programming experience. We have implemented a prototype which is currently able to find syntax and semantic errors by using compiler messages and static analysis tools. Furthermore, programs are checked dynamically by black box and unit tests. We also implemented a classification for the errors found by the different tools. The classification makes it easier to investigate common errors and to provide useful hints to the students about the particular error.

Further research will concentrate on the detection and classification of logic errors. AnalyseC [29] is a first attempt to localize logic errors in programming assignments. Its authors use standardization, control flow, and dependence information to compare a student's solution with a model solution. In their case study, 80% of the errors could be found. We plan to extend this approach by adding further information for the comparison, e.g., coverage information or traces. AnalyseC does not classify the found errors, so we need to develop patterns for the classification of logic errors. Furthermore, we will investigate convenient similarity metrics for the clustering of students' solutions. A complete evaluation of the framework will be made after the framework has been used in the C course for two years. Additionally, mining techniques can be applied to the data gained with the framework to investigate the evolution of students' programming behavior, skills, and experience.
REFERENCES

[1] P. Ihantola, T. Ahoniemi, V. Karavirta, and O. Seppälä, "Review of recent systems for automatic assessment of programming assignments," in Proceedings of the 10th Koli Calling International Conference on Computing Education Research, 2010, pp. 86-93.
[2] D. Jackson and M. Usher, "Grading student programs using ASSYST," in Proceedings of the Twenty-Eighth SIGCSE Technical Symposium on Computer Science Education, 1997, vol. 29, no. 1, pp. 335-339.
[3] M. Goedicke, M. Striewe, and M. Balz, "Computer Aided Assessments and Programming Exercises with JACK," 2008.
[4] M. Striewe and M. Goedicke, "Using Run Time Traces in Automated Programming Tutoring," in Proceedings of the 16th Annual Joint Conference on Innovation and Technology in Computer Science Education, 2011, pp. 303-307.
[5] M. Striewe and M. Goedicke, "Visualizing Data Structures in an e-Learning System," in Proceedings of the 2nd International Conference on Computer Supported Education, 2010, vol. 1, pp. 172-179.
[6] J. C. Spohrer and E. Soloway, "Novice mistakes: are the folk wisdoms correct?," Commun. ACM, vol. 29, no. 7, pp. 624-632, 1986.
[7] M. Hristova, A. Misra, M. Rutter, and R. Mercuri, "Identifying and Correcting Java Programming Errors for Introductory Computer Science Students," ACM SIGCSE Bull., vol. 35, no. 1, pp. 153-156, 2003.
[8] J. Jackson, M. Cobb, and C. Carver, "Identifying Top Java Errors for Novice Programmers," in Proceedings of the 35th Annual Frontiers in Education Conference (FIE '05), 2005, pp. T4C-24-T4C-27.
[9] M. Ahmadzadeh, D. Elliman, and C. Higgins, "An Analysis of Patterns of Debugging Among Novice Computer Science Students," ACM SIGCSE Bull., vol. 37, no. 3, pp. 84-88, 2005.
[10] N. Coull, I. Duncan, and J. Archibald, "Helping Novice Programmers Interpret Compiler Error Messages," in 4th Annual LTSN-ICS Conference, 2003.
[11] A. E. Fleury, "Programming in Java: Student-Constructed Rules," in Proceedings of the Thirty-First SIGCSE Technical Symposium on Computer Science Education, 2000, pp. 197-201.
[12] S. H. Edwards and V. Ly, "Mining the Data in Programming Assignments for Educational Research," in Proceedings of the International Conference on Education and Information Systems: Technologies and Applications, 2007, pp. 135-140.
[13] A. J. Ko and B. A. Myers, "Development and Evaluation of a Model of Programming Errors," in Proceedings of the 2003 IEEE Symposium on Human Centric Computing Languages and Environments, 2003.
[14] "Checkstyle." [Online]. Available: http://checkstyle.sourceforge.net/. [Accessed: 16-Nov-2015].
[15] R. Jocham, "JCSC - Java Coding Standard Checker." [Online]. Available: http://jcsc.sourceforge.net/. [Accessed: 16-Nov-2015].
[16] Google, "Google C++ Style Guide." [Online]. Available: https://github.com/google/styleguide/tree/gh-pages/cpplint. [Accessed: 16-Nov-2015].
[17] R. E. Berry and B. A. E. Meekings, "A style analysis of C programs," Commun. ACM, vol. 28, no. 1, pp. 80-88, 1985.
[18] K. Ala-Mutka, T. Uimonen, and H.-M. Järvinen, "Supporting Students in C++ Programming Courses with Automatic Program Style Assessment," J. Inf. Technol. Educ., vol. 3, no. 1, pp. 245-262, 2004.
[19] T. A. Budd and D. Angluin, "Two Notions of Correctness and Their Relation to Testing," Acta Inform., vol. 18, pp. 31-45, 1982.
[20] Free Software Foundation, Inc., "GCC, the GNU Compiler Collection." [Online]. Available: https://gcc.gnu.org/. [Accessed: 16-Nov-2015].
[21] "clang: a C language family frontend for LLVM." [Online]. Available: http://clang.llvm.org/. [Accessed: 16-Nov-2015].
[22] "Splint - Annotation-Assisted Lightweight Static Checking." [Online]. Available: http://www.splint.org/. [Accessed: 16-Nov-2015].
[23] "Cppcheck - A tool for static C/C++ code analysis." [Online]. Available: http://cppcheck.sourceforge.net/. [Accessed: 16-Nov-2015].
[24] Verateam, "vera++." [Online]. Available: https://bitbucket.org/verateam/vera/wiki/Home. [Accessed: 17-Nov-2015].
[25] "CUnit - A Unit Testing Framework for C." [Online]. Available: http://cunit.sourceforge.net/. [Accessed: 16-Nov-2015].
[26] "DejaGnu." [Online]. Available: http://www.gnu.org/software/dejagnu/. [Accessed: 16-Nov-2015].
[27] "gcov - a Test Coverage Program." [Online]. Available: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html. [Accessed: 16-Nov-2015].
[28] "LCOV - the LTP GCOV extension." [Online]. Available: http://ltp.sourceforge.net/coverage/lcov.php. [Accessed: 16-Nov-2015].
[29] W. Wu, G. Li, Y. Sun, J. Wang, and T. Lai, "AnalyseC: A Framework for Assessing Students' Programs at Structural and Semantic Level," in Proceedings of the IEEE International Conference on Control and Automation (ICCA), 2007, pp. 742-747.

