
If bridges and buildings were made the way we make software, we would have disasters happening all around us. I have heard this said many times. It is sad but true. Buggy software is the bane of the software industry. One way to improve software quality is proper education. Several professionals from the software industry attest to this. They believe greater emphasis should be given to quality and testing in university courses. But simply explaining the principles of software quality is not sufficient. Students tend to forget theoretical principles over time. Practical exposure and experience are equally important. Students should be put in an environment where they can appreciate the importance of quality software and experience the benefits of processes that enhance quality.

Many universities have a period of internship in which students work in a software company and experience these factors first hand. However, because an internship usually lasts only 3-6 months, it is not sufficient to instill the importance of quality. For it to have a proper impact, emphasis on code quality should be part of the entire software curriculum. Every assignment that students submit should be subjected to the same quality standards that an industrial project would be subjected to.


Several efforts have been made to design and implement automated grading systems in universities. In this post I will briefly explain two such systems: WEB-CAT and Praktomat.

WEB-CAT

WEB-CAT was created at Virginia Tech to address the need for incorporating software testing as an integral part of all programming courses. The creators realized the need to automate the grading of student assignments, to enable faster feedback to students and to balance the workload of faculty members.

Since Test Driven Development (TDD) was to be used for all the assignments, the students had to be graded not only on the quality of their code, but also on the quality of their test suite. WEB-CAT grades students on three criteria. It gives each assignment a test validity score, a test coverage score, and a code correctness score. Test validity measures the accuracy of the students' tests: it determines whether the tests are consistent with the problem statement. Test coverage determines how much of the source code the tests cover, that is, whether all paths and conditionals are adequately covered. Code correctness measures the correctness of the actual code. Each of the three criteria is given a certain weight, and a final score is determined.
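To make the weighting idea concrete, here is a minimal sketch of how three such scores could be combined into one grade. The weights and the weighted-sum formula are my own assumptions for illustration; the actual formula WEB-CAT uses is not described here, and the names `CriterionScores`, `ASSUMED_WEIGHTS`, and `final_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CriterionScores:
    """Per-criterion scores, each a fraction between 0.0 and 1.0."""
    test_validity: float
    test_coverage: float
    code_correctness: float


# Hypothetical weights, chosen only for illustration; they must sum to 1.0.
ASSUMED_WEIGHTS = {
    "test_validity": 0.3,
    "test_coverage": 0.3,
    "code_correctness": 0.4,
}


def final_score(scores, weights=ASSUMED_WEIGHTS):
    """Combine the three criteria into a single score out of 100."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    total = sum(getattr(scores, name) * w for name, w in weights.items())
    return round(100 * total, 1)


# A submission with valid tests, half the code covered, and 80% correct code.
example = final_score(CriterionScores(1.0, 0.5, 0.8))
```

A weighted sum is only one possible choice. A grader could just as well multiply the scores, so that a weak test suite drags down the whole grade.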

WEB-CAT’s graphical user interface is inspired by the unit testing tool JUnit. Just like JUnit, it uses a green bar to show the test results. A text description with details such as the number of tests that were run and the number that passed is also provided.

Basic features provided by WEB-CAT are:

  • Submission of student assignments using a web-based wizard interface
  • Submission of test cases using a web-based wizard interface
  • Setup of assignments by faculty
  • Download of student scores by faculty
  • Automatic grading with immediate feedback on student assignments


WEB-CAT follows a fixed sequence of steps to assess a project submission. A submission is assessed only if it compiles successfully. If compilation fails, a summary of the errors is displayed to the user. If the program compiles successfully, WEB-CAT assesses the project on various parameters. It first tests the correctness of the program by running the student’s tests against the program. Since these tests are submitted by the students themselves, 100% of them are expected to pass; we do not expect students to submit a program that fails their own tests. After this, the student’s test cases are validated by running them against a reference implementation of the project created by the instructor. If a student’s test case fails on the reference implementation, it is deemed invalid. Finally, the coverage of the student’s test cases is evaluated. Once the scores are obtained, a cumulative score out of 100 is calculated by applying a certain formula to the scores from all criteria. The results are displayed immediately to the student on an HTML interface.
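The sequence of steps above can be sketched in a few lines. This is not WEB-CAT's code; it is a rough outline under my own assumptions, in which the four callables passed to the hypothetical `assess` function stand in for the real compiler, test runner, reference implementation, and coverage tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Assessment:
    """Outcome of one submission; scores are fractions between 0.0 and 1.0."""
    compiled: bool
    compile_errors: List[str] = field(default_factory=list)
    code_correctness: Optional[float] = None
    test_validity: Optional[float] = None
    test_coverage: Optional[float] = None


def pass_rate(results: List[bool]) -> float:
    """Fraction of tests that passed; an empty test suite earns 0.0."""
    return sum(results) / len(results) if results else 0.0


def assess(
    compile_submission: Callable[[], List[str]],
    run_tests_on_submission: Callable[[], List[bool]],
    run_tests_on_reference: Callable[[], List[bool]],
    measure_coverage: Callable[[], float],
) -> Assessment:
    # Step 1: a submission is assessed only if it compiles.
    errors = compile_submission()
    if errors:
        return Assessment(compiled=False, compile_errors=errors)
    # Step 2: run the student's tests against the student's own program.
    correctness = pass_rate(run_tests_on_submission())
    # Step 3: run the same tests against the instructor's reference
    # implementation; a test that fails there is treated as invalid.
    validity = pass_rate(run_tests_on_reference())
    # Step 4: measure how much of the student's code the tests cover.
    coverage = measure_coverage()
    return Assessment(True, [], correctness, validity, coverage)


# Stubbed example: all four tests pass on the student's code, but one of
# them fails on the reference implementation and is therefore invalid.
result = assess(
    compile_submission=lambda: [],
    run_tests_on_submission=lambda: [True, True, True, True],
    run_tests_on_reference=lambda: [True, True, True, False],
    measure_coverage=lambda: 0.8,
)
```

Passing the steps in as functions keeps the sketch independent of any particular language toolchain, which is an illustrative design choice rather than a description of how WEB-CAT is built.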

It was observed that the quality of student assignments increased significantly after WEB-CAT was introduced. Code developed using WEB-CAT was found to contain 45% fewer defects per 1000 (non-commented) lines of code.

 

Praktomat

Praktomat was created at Universität Passau in Germany. Its purpose was to build an environment that would help students enhance the quality of their code. Along with automated grading, it also has a focus on peer reviews. The creators of Praktomat felt that reviewing others' software and having one’s own software reviewed helps in producing better code. This is why Praktomat allows users to review as well as annotate code written by other students. Students can resubmit their code any number of times until the deadline. This way they can improve their code by applying what they learned from reviewing other students' code, as well as what they learned from others' feedback on their own code.

Praktomat evaluates student assignments by running them against test suites provided by the faculty. The faculty creates two test suites: a public suite and a secret suite. The public suite is distributed to the students to help them validate their project. The secret suite is not made available to the students, but they are aware of its existence. An assignment is evaluated by automatically running both test suites against it, and also by manual examination by the faculty. Praktomat was developed in Python, and is hosted on SourceForge.
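The two-suite idea is easy to illustrate. The sketch below is not taken from Praktomat; it assumes, purely for illustration, that a submission is a single Python function and that each test case is a pair of arguments and an expected result. The names `run_suite` and `evaluate` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A test case is (arguments, expected result); this shape is an assumption.
TestCase = Tuple[tuple, object]


def run_suite(solution: Callable, suite: List[TestCase]) -> int:
    """Return how many test cases in the suite the solution passes."""
    passed = 0
    for args, expected in suite:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed


def evaluate(solution: Callable, public_suite: List[TestCase],
             secret_suite: List[TestCase]) -> Dict[str, int]:
    """Run both suites; manual examination by faculty happens separately."""
    return {
        "public_passed": run_suite(solution, public_suite),
        "public_total": len(public_suite),
        "secret_passed": run_suite(solution, secret_suite),
        "secret_total": len(secret_suite),
    }


def student_abs(x):
    """A buggy submission: it is wrong for negative numbers."""
    return x


public_suite = [((3,), 3)]            # distributed to the students
secret_suite = [((-3,), 3), ((0,), 0)]  # known to exist, but not distributed
report = evaluate(student_abs, public_suite, secret_suite)
```

Here the buggy submission passes every public test but fails one secret test, which shows why a hidden suite discourages students from coding only to the tests they can see.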


Conclusion

My contention that student project submissions should be backed by a process that encourages best practices, and by software that automates and facilitates that process, has become stronger after reviewing WEB-CAT and Praktomat.

What best practices should we incorporate in the process? What features should automated grading software contain? WEB-CAT, Praktomat, and several other systems give a good starting point. We can learn from their successes and failures, and enhance the offering by adding our own experience.

WEB-CAT and several other sources have shown us that TDD is definitely a good practice. In a university environment, TDD will work best if it is complemented by instant feedback to the students. We want a process that encourages students to improve the quality of their code. They should be graded on the best code they can submit before the deadline. Two things are needed for this: instant feedback and the ability to resubmit assignments. WEB-CAT achieves this by assessing submissions in real time and displaying the results to the students immediately. WEB-CAT allows students to resubmit assignments any number of times until the due date. Since faculty members are already overloaded with work, the software should take over some of the faculty's responsibilities. WEB-CAT automatically evaluates and grades the students' assignments, leaving faculty with time for more meaningful activities.
 
Praktomat has shown us that there is a definite benefit to peer review. When we review code written by others, we can go beyond the paradigms set in our own minds. Having our code reviewed by others can help us see shortcomings that we may have overlooked earlier. Praktomat allows students to review code written by others. However, the review is hidden from the faculty, to ensure that it does not impact grading. Praktomat does not rely on 100% automatic evaluation of the assignments. It evaluates certain aspects automatically, and the rest are evaluated manually. Factors like code quality, documentation, etc. are reviewed and evaluated manually by the faculty. There may be two reasons for this: software to support automatic evaluation of these things may not have been available when Praktomat was written, or the creators felt that certain things are best evaluated by the faculty.